Abstract: In this paper we consider a simple model of random graph process with hard copying, defined as follows. At each time step $t$, with probability $0 < \alpha < 1$ a new vertex $v_t$ is added and $m$ edges incident with $v_t$ are added in the manner of preferential attachment; or, with probability $1 - \alpha$, an existing vertex is copied uniformly at random. In this way, when a vertex with large degree is copied, the number of added edges equals its degree, and thus the number of added edges is not upper bounded. We prove that, when $\alpha$ is large enough, the model possesses a mean degree sequence $d_k \sim C k^{-(1+2\alpha)}$, where $d_k$ is the limit mean proportion of vertices of degree $k$.
1 Introduction and the statement of the main result

Real-world networks such as networks of economic companies, biological oscillators, social networks and the World Wide Web (internet) can be modeled by random complex graphs [3, ?, ?, 13, ?, 22]. By studying random complex graphs, various topological properties of these real-world networks have been investigated, such as the degree distribution [6, ?, 12, ?], the diameter [?, 3, 10], clustering [?, 18], stability [?, ?, ?] and the spectral gap [2]. One of the most basic properties of many real-world networks concerns power law degree distributions. As indicated in [?], the emergence of power law degree distributions should be a consequence of two generic mechanisms:

1. Evolution: new vertices and edges are added continuously, and 

2. Preferential attachment: new vertices are preferentially attached to vertices that are 
already well connected. 

The above mechanisms are referred to as BA mechanisms. Besides the original model proposed in [6], many other models with the BA mechanisms have been introduced to explain the underlying causes of the emergence of power law degree distributions. Examples include the 'LCD model' [10], the generalization of the 'LCD model' due to Buckley and Osthus [8], the very general models defined by Cooper and Frieze [13], and the models of Cooper, Frieze and Vera [?].

Copying is another mechanism that may be observed in real-world networks. The basic idea of copying comes from the fact that a new web page is often made by copying an old one. A family of copying models was proposed by Kumar et al. [15] to explain the emergence of degree power laws in web graphs. These models are parameterized by a copy factor $\alpha \in (0,1)$ and a constant out-degree $d \ge 1$. At each time step, one vertex $u$ is added and $d$ out-links are generated for $u$ as follows. First, an existing vertex $p$ is chosen uniformly at random; then, with probability $1 - \alpha$, the $i$-th out-link of $p$ is taken to be the $i$-th out-link of $u$, and with probability $\alpha$ a vertex chosen uniformly at random from the existing vertices becomes the destination of the $i$-th out-link of $u$. It is proved in [15] that these copying models possess a power law degree sequence $d_k \sim C k^{-(2-\alpha)/(1-\alpha)}$.
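A minimal Python sketch of the copying step just described may help fix ideas. It is an illustration under stated assumptions, not the code of [15]: the function names are ours, and the seed (vertex 0, whose $d$ out-links all point to itself) is a hypothetical initialization, since the text does not say how the process starts.

```python
import random

def kumar_copying_graph(n, d, alpha, seed=None):
    """Hypothetical sketch of the copying process of Kumar et al. described above.

    Returns out, where out[u] lists the d out-link destinations of vertex u.
    Assumption (not in the text): vertex 0 is a seed whose d out-links point to itself.
    """
    rng = random.Random(seed)
    out = [[0] * d]
    for u in range(1, n):
        p = rng.randrange(u)                    # prototype p: uniform existing vertex
        links = []
        for i in range(d):
            if rng.random() < 1 - alpha:
                links.append(out[p][i])         # copy the i-th out-link of p
            else:
                links.append(rng.randrange(u))  # uniform existing vertex
        out.append(links)
    return out

def in_degrees(out):
    """In-degree of every vertex (the seed's self-links included)."""
    indeg = [0] * len(out)
    for links in out:
        for w in links:
            indeg[w] += 1
    return indeg
```

Since every out-degree equals $d$, the variation in degrees comes entirely from the in-degrees, which `in_degrees` tabulates.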



In this paper we introduce and study a new copying model created by lazy copiers. Our copiers are so lazy that the only thing they want to do is copy. By contrast, the copiers corresponding to the copying action discussed in [15] must be more clever and diligent: for the chosen vertex $p$, they first have to identify the original out-links of $p$ and then decide whether or not to copy each of them.

Consider the following random process $G_t$, $t = 2, 3, \ldots$. Write $G_t = (V_t, E_t)$, so that $t = |V_t|$, and let $e_t = |E_t|$. (To simplify the statement and the proof of our main result, we technically start the process at time step 2.)

Time-Step 2: To begin the process, we start with $G_2$ consisting of vertices $v_1$, $v_2$ and $2m$ multi-edges between them.

Time-Steps $t \ge 3$:

• With probability $\alpha$ we add a new vertex $v_t$ to $G_{t-1}$ and then add $m$ random edges incident with $v_t$. The $m$ random neighbors $w_1, w_2, \ldots, w_m$ are chosen independently, and for any $1 \le i \le m$ and $w \in V_{t-1}$,

$$\mathrm{P}(w_i = w) = \frac{d_w(t-1)}{2e_{t-1}}, \tag{1.1}$$

where $d_w(t-1)$ denotes the degree of vertex $w$ in $G_{t-1}$. Thus neighbors are chosen by preferential attachment.

• With probability $1 - \alpha$ we generate vertex $v_t$ by copying an existing vertex $v_i$, $1 \le i \le t-1$, chosen from $V_{t-1}$ uniformly at random. Note that in this case, all neighbors of $v_t$ are those of the copied vertex $v_i$.
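The two bullet points translate directly into a simulation. The sketch below is a hypothetical helper, not code from the paper. It assumes that a copied vertex passes on every one of its edges, parallel edges included, so that $v_t$ receives exactly as many edges as the copied vertex has degree, as stated in the abstract. Preferential attachment (1.1) is implemented by drawing a uniformly random endpoint from the list of all edge endpoints, which selects $w$ with probability $d_w(t-1)/(2e_{t-1})$.

```python
import random

def simulate_hard_copying(T, m, alpha, seed=None):
    """Simulate G_2, ..., G_T of the hard-copying process; return the degrees in G_T.

    Vertex j (0-based) plays the role of v_{j+1}. Parallel edges are kept.
    """
    rng = random.Random(seed)
    # G_2: v_1 and v_2 joined by 2m parallel edges; adj[x] lists neighbours with multiplicity.
    adj = [[1] * (2 * m), [0] * (2 * m)]
    # Every edge contributes both endpoints, so a uniform draw from this list
    # selects vertex w with probability d_w / (2 e), as in (1.1).
    endpoints = [0, 1] * (2 * m)
    for _ in range(T - 2):                 # time steps t = 3, ..., T
        v = len(adj)                       # index of the new vertex v_t
        adj.append([])
        if rng.random() < alpha:
            # m independent preferential-attachment neighbours, drawn from G_{t-1}
            targets = [rng.choice(endpoints) for _ in range(m)]
        else:
            # hard copying: v_t takes over every edge of a uniformly chosen vertex
            u = rng.randrange(v)
            targets = list(adj[u])
        for w in targets:
            adj[v].append(w)
            adj[w].append(v)
            endpoints.extend((v, w))
    return [len(nbrs) for nbrs in adj]
```

For instance, `collections.Counter(simulate_hard_copying(10**4, 2, 0.9))` gives an empirical version of $D_k(t)$ along one sample path; the parameters $m = 2$, $\alpha = 0.9$ satisfy $2m(1-\alpha) = 0.4 < \alpha$, the condition of Theorem 1.1.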

As defined above, our copying is executed in a direct and simple way, which we refer to as hard copying. With hard copying, $e_t$ may increase nonlinearly, which makes bounding $e_t$ a rather hard problem.

Now, let $D_k(t)$ be the number of vertices with degree $k \ge 0$ in $G_t$ and let $\bar D_k(t)$ be the expectation of $D_k(t)$. The main result of this paper is as follows.



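Theorem 1.1 Assume that $2m(1-\alpha) < \alpha$. Then, for all $k \ge 0$, the limit $d_k = \lim_{t\to\infty} \bar D_k(t)/t$ exists and satisfies

$$d_k = 0,\ 0 \le k < m; \qquad d_m = \frac{2\alpha}{m+2\alpha}; \qquad d_k = \prod_{i=m+1}^{k}\left(1 - \frac{1+2\alpha}{i+2\alpha}\right) d_m,\ \forall\, k > m.$$

Obviously, $d_k \sim C k^{-(1+2\alpha)}$ for some constant $C$: each factor equals $(i-1)/(i+2\alpha)$, so $d_k = d_m \frac{\Gamma(k)\,\Gamma(m+1+2\alpha)}{\Gamma(m)\,\Gamma(k+1+2\alpha)}$, and $\Gamma(k)/\Gamma(k+1+2\alpha) \sim k^{-(1+2\alpha)}$.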
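The limits in Theorem 1.1 are easy to evaluate numerically. The sketch below (the function name is ours) builds $d_k$ from the product formula. As sanity checks one can observe that $\sum_k d_k$ approaches 1, that $\sum_k k d_k$ approaches $2\mu = 2\alpha m/(2\alpha-1)$, where $\mu = \alpha m/(1-2(1-\alpha))$ is the constant introduced in Section 2 (so $2\mu$ is the limiting average degree suggested by (2.11)), and that $d_k k^{1+2\alpha}$ levels off. These are numerical observations for one parameter choice, not part of the proof.

```python
def limit_degree_proportions(m, alpha, kmax):
    """Return [d_0, ..., d_kmax] from the formula in Theorem 1.1."""
    d = [0.0] * (kmax + 1)               # d_k = 0 for 0 <= k < m
    d[m] = 2 * alpha / (m + 2 * alpha)
    for k in range(m + 1, kmax + 1):
        # factor 1 - (1 + 2 alpha)/(k + 2 alpha), i.e. (k - 1)/(k + 2 alpha)
        d[k] = d[k - 1] * (1 - (1 + 2 * alpha) / (k + 2 * alpha))
    return d

m, alpha = 2, 0.9
d = limit_degree_proportions(m, alpha, 100_000)
mu = alpha * m / (2 * alpha - 1)
print(sum(d))                                          # approximately 1
print(sum(k * x for k, x in enumerate(d)), 2 * mu)     # truncated mean degree vs 2*mu
print(d[1000] * 1000 ** (1 + 2 * alpha), d[2000] * 2000 ** (1 + 2 * alpha))
```

The truncation at `kmax` leaves a tail of order $k_{\max}^{1-2\alpha}$ in the mean degree, which is why that comparison converges slowly.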



We follow the basic procedures in [13] and [?] to prove our main theorem. The rest of the paper is organized as follows. In Section 2, we bound the maximum degree and then bound $e_t$, the number of edges in $G_t$. In Section 3, using the estimates given in Section 2, we establish the recurrence for $\bar D_k(t)$. Finally, in Section 4, we approximate $\bar D_k(t)$ by a recurrence with respect to $k$ and then solve the recurrence in $k$ to finish the proof of Theorem 1.1.

Here we note that although this paper focuses on power law degree distributions, other degree distributions of random graph processes, including exponential degree distributions, have also been observed [3, ?, 16, 22]. Furthermore, phase transitions may emerge in the degree distributions of random graph processes [20, 21]. The phase transition problem for the copying model proposed in this paper is left for future investigation.

2 Bounding the degree and the number of edges 

In this section, we first bound the maximum degree in $G_t$ and then bound $e_t$. In fact, we give four kinds of estimates of $e_t$; as will be seen in Section 3, all four are necessary for establishing the recurrence for $\bar D_k(t)$.

For $t \ge 2$, let $V_t^o$ be the set of original vertices in $V_t$, namely

$$V_t^o := \{v \in V_t : v = v_1, v_2 \text{ or } v \text{ is added as a new vertex at some time step } 3 \le s \le t\}.$$

For any times $s$ and $t$ with $3 \le s \le t$, if $v_s \in V_t^o$, then

$$d_{v_s}(s) = \tfrac12 d_{v_1}(2) = \tfrac12 d_{v_2}(2) = m. \tag{2.1}$$



We say an event happens quite surely (qs) if the probability of the complement of the event is $O(t^{-K})$ for any $K > 0$.

We bound the degrees in $G_t$ from above as follows.

Lemma 2.1 Assume that $2m(1-\alpha) < 1$ and $v_s \in V_t^o$. Then

$$d_{v_s}(t) \le (t/s)^{\alpha/2 + m(1-\alpha)} (\log t)^3 \quad \text{qs}. \tag{2.2}$$

Proof. Let $Y$ be a $\{0,1\}$-valued random variable with $\mathrm{P}(Y = 1) = \alpha = 1 - \mathrm{P}(Y = 0)$. Then, using the fact that $e_t \ge mt$, conditionally on $G_t$ the degree $d_{v_s}(t+1)$ is stochastically dominated by

$$d_{v_s}(t) + Y\,B\!\left(m, \frac{d_{v_s}(t)}{2mt}\right) + (1-Y)\,m\,B\!\left(d_{v_s}(t), \frac{1}{t}\right), \tag{2.3}$$

where $B(\cdot,\cdot)$ denotes a binomial random variable.

Using (2.1) and relation (2.3), Lemma 2.1 follows from the same argument as used in [13, ?, 20]. □



For any $v \in V_t$, if $v$ is created at time step $s$ by copying some vertex $v_r$, $1 \le r \le s-1$, we call $v$ the daughter vertex of $v_r$ and call $v_r$ the mother vertex of $v$. Denote by $D(v, G_t)$ the set of all descendants of $v$ in $G_t$. By the definition of the model, for any $v_s \in V_t^o$ and $v \in D(v_s, G_t)$, $d_v(t)$ has the same distribution as $d_{v_s}(t)$. Now, denote by $\Delta_t$ the maximum degree in $G_t$; then, by Lemma 2.1 and the above analysis, we have

$$\Delta_t \le t^{\alpha/2 + m(1-\alpha)} (\log t)^3 \quad \text{qs}. \tag{2.4}$$

For any $v_s \in V_t^o$, let $f_{v_s}(t) = |D(v_s, G_t)|$ be the number of all descendants of $v_s$. Then we have

Lemma 2.2 For any $s \ge 1$, if $v_s$ is an original vertex, i.e., $v_s \in V_t^o$ for some $t \ge 2$, then

$$f_{v_s}(t) \le (t/s)^{1-\alpha} (\log t)^3 \quad \text{qs}. \tag{2.5}$$



Proof. Let $Y$ be the random variable used in the proof of Lemma 2.1. Then, conditionally on $G_t$,

$$f_{v_s}(t+1) \overset{d}{=} f_{v_s}(t) + (1-Y)\,B\!\left(1, \frac{f_{v_s}(t)}{t}\right). \tag{2.6}$$

The Lemma follows from relation (2.6) and the same argument as used in the proof of Lemma 2.1. □
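Taking expectations in (2.6) gives $\mathrm{E} f_{v_s}(t+1) = (1 + (1-\alpha)/t)\,\mathrm{E} f_{v_s}(t)$, and the product of these factors is where the growth rate $(t/s)^{1-\alpha}$ in (2.5) comes from. The short check below assumes $f_{v_s}(s) = 1$, i.e. that $v_s$ is counted among its own descendants (which the decomposition of $2e_t$ over descendant sets later in this section requires), and takes $s \ge 2$; the function name is ours.

```python
def expected_descendants(s, t, alpha):
    """E f_{v_s}(t) from the mean of (2.6), assuming f_{v_s}(s) = 1 and s >= 2."""
    f = 1.0
    for j in range(s, t):
        f *= 1 + (1 - alpha) / j          # one time step j -> j + 1
    return f

alpha, t = 0.9, 100_000
for s in (2, 10, 100):
    # ratio of the exact mean to the growth rate (t/s)^(1 - alpha) of Lemma 2.2
    print(s, expected_descendants(s, t, alpha) / (t / s) ** (1 - alpha))
```

The ratios stay close to 1, consistent with (2.5) up to the logarithmic factor, which only enters through the concentration argument.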

Now we begin to bound $e_t$, the number of edges in $G_t$. Let $a_t$ be the number of edges added at time step $t+1$, i.e., $e_{t+1} = a_t + e_t$. By the definition of the model, we have $a_t \le \max\{\Delta_t, m\} = \Delta_t$ for all $t \ge 2$. On the other hand, noticing that the number of multi-edges between any given pair of vertices is at most $2m$, we have

$$\Delta_2 = 2m, \qquad \Delta_{t+1} \le \Delta_t + 2m, \quad \forall\, t \ge 2.$$

This gives the following deterministic upper bound on $e_t$:

$$e_t = 2m + \sum_{s=2}^{t-1} a_s \le 2m + \sum_{s=2}^{t-1} 2m(s-1) = O(t^2). \tag{2.7}$$

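For random upper bounds on $e_t$, we first prove a crude one:

$$e_t \le O\big(t (\log t)^6\big) \quad \text{qs}. \tag{2.8}$$

Indeed, we have

$$2e_t = \sum_{v \in V_t} d_v(t) = \sum_{v_s \in V_t^o} \sum_{v \in D(v_s, G_t)} d_v(t).$$

By Lemma 2.1 and Lemma 2.2,

$$\sum_{v_s \in V_t^o} \sum_{v \in D(v_s, G_t)} d_v(t) \le \sum_{s=1}^{t} \Big[(t/s)^{\alpha/2 + (m+1)(1-\alpha)} (\log t)^6\Big] = O\big(t (\log t)^6\big) \quad \text{qs}.$$

Note that for the last equality we have used the condition $2m(1-\alpha) < \alpha$ from the statement of Theorem 1.1, which is equivalent to $\alpha/2 + (m+1)(1-\alpha) < 1$.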
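The equivalence between $2m(1-\alpha) < \alpha$ and $\alpha/2 + (m+1)(1-\alpha) < 1$, which is what the bound on $2e_t$ needs, is simple arithmetic; the exact check below (function names hypothetical) confirms it on a grid of rational parameters. It also evaluates the normalized sum $t^{-1}\sum_{s=1}^t (t/s)^{\gamma}$, which stays below $1/(1-\gamma)$ whenever $\gamma < 1$; this is why an exponent smaller than 1 yields a bound of order $t$ up to logarithmic factors.

```python
from fractions import Fraction

def edge_sum_exponent(m, alpha):
    # exponent of t/s after multiplying the bounds of Lemma 2.1 and Lemma 2.2
    return alpha / 2 + m * (1 - alpha) + (1 - alpha)

def theorem_condition(m, alpha):
    # hypothesis of Theorem 1.1
    return 2 * m * (1 - alpha) < alpha

def normalized_sum(t, gamma):
    # t^{-1} * sum_{s=1}^{t} (t/s)^gamma
    return sum((t / s) ** gamma for s in range(1, t + 1)) / t

# Exact check of: alpha/2 + (m+1)(1-alpha) < 1  <=>  2m(1-alpha) < alpha
for m in range(1, 6):
    for j in range(1, 100):
        alpha = Fraction(j, 100)
        assert (edge_sum_exponent(m, alpha) < 1) == theorem_condition(m, alpha)

print(normalized_sum(10**5, 0.8))   # bounded by 1/(1 - 0.8) = 5
```

Exact rational arithmetic is used so that boundary cases such as $m = 2$, $\alpha = 4/5$ (where both sides are equalities) are decided correctly.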

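Secondly, we estimate $\mathrm{E}(e_t)$, the expectation of the number of edges in $G_t$. Since a uniformly chosen vertex of $G_t$ has expected degree $2e_t/t$, the definition of the model gives

$$\mathrm{E}(e_{t+1} \mid G_t) = e_t + \alpha m + (1-\alpha)\frac{2e_t}{t}, \tag{2.9}$$

so

$$\mathrm{E}(e_{t+1}) = \mathrm{E}(e_t)\left(1 + \frac{2(1-\alpha)}{t}\right) + \alpha m. \tag{2.10}$$

Let

$$\eta_t := e_t - \mu t, \qquad \text{where } \mu = \frac{\alpha m}{1 - 2(1-\alpha)}.$$

Then (2.10) implies that

$$\mathrm{E}(\eta_{t+1}) = \mathrm{E}(\eta_t)\left(1 + \frac{2(1-\alpha)}{t}\right).$$

Thus, $\mathrm{E}(\eta_t) = O(t^{2(1-\alpha)})$ and we have

$$\mathrm{E}(e_t) = \mu t + O(t^{2(1-\alpha)}). \tag{2.11}$$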
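To see (2.11) at work, one can iterate the exact recurrence (2.10) for the mean, starting from $\mathrm{E}(e_2) = 2m$ (since $G_2$ has $2m$ edges), and compare $\mathrm{E}(e_t)/t$ with $\mu$. This is a hypothetical numerical illustration only; note that $\mu > 0$ requires $\alpha > 1/2$, which holds under the hypothesis of Theorem 1.1.

```python
def expected_edges(T, m, alpha):
    """Iterate (2.10) from E(e_2) = 2m and return E(e_T)."""
    e = 2.0 * m
    for t in range(2, T):
        e = e * (1 + 2 * (1 - alpha) / t) + alpha * m
    return e

m, alpha, T = 2, 0.9, 100_000
mu = alpha * m / (1 - 2 * (1 - alpha))
ratio = expected_edges(T, m, alpha) / T
print(ratio, mu)   # the gap is O(T^(1 - 2 alpha)), as predicted by (2.11)
```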

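Finally, we have the following probability estimate on $e_t$.

Lemma 2.3 Assume that $2m(1-\alpha) < 1$. Take $\varepsilon_0 > 0$ such that $1 + 2\varepsilon_0 + 2m(1-\alpha) < 2$. Then

$$\mathrm{P}\left(|e_t - \mu t| > t^{1/2 + \varepsilon_0 + m(1-\alpha)}\right) = O(t^{-\varepsilon_0}). \tag{2.12}$$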

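Proof. To get the estimate (2.12), we have to bound $\mathrm{Var}(e_t)$, the variance of $e_t$. First of all, we have

$$\mathrm{Var}(e_{t+1}) = \mathrm{Var}(a_t + e_t) = \mathrm{Var}(e_t) + \mathrm{Var}(a_t) + 2\big(\mathrm{E}(a_t e_t) - \mathrm{E}(a_t)\mathrm{E}(e_t)\big). \tag{2.13}$$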

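By definition, we have

$$\mathrm{E}(a_t^2 \mid G_t) = \alpha m^2 + (1-\alpha)\frac{\sum_{v \in V_t} d_v(t)^2}{t}.$$

Then, by Lemma 2.1 and Lemma 2.2,

$$\begin{aligned}
\mathrm{E}(a_t^2) &= \alpha m^2 + \frac{1-\alpha}{t}\,\mathrm{E}\Big(\sum_{v_s \in V_t^o} \sum_{v \in D(v_s, G_t)} d_v(t)^2\Big) \\
&\le \alpha m^2 + \frac{1-\alpha}{t} \sum_{s=1}^{t} \Big[(t/s)^{\alpha + 2m(1-\alpha)} (\log t)^6\Big]\Big[(t/s)^{1-\alpha} (\log t)^3\Big] + o(1) \\
&= O\big(t^{2m(1-\alpha)} (\log t)^9\big),
\end{aligned} \tag{2.14}$$

where the $o(1)$ term accounts for the exceptional events of Lemmas 2.1 and 2.2, on which we use the deterministic bound (2.7).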



In addition, by (2.9) and (2.11), we have

$$\mathrm{E}(a_t) = \alpha m + 2(1-\alpha)\mu + O(t^{1-2\alpha}). \tag{2.15}$$

Thus

$$\mathrm{Var}(a_t) = O\big(t^{2m(1-\alpha)} (\log t)^9\big). \tag{2.16}$$

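For the term $\mathrm{E}(a_t e_t)$, using (2.9), it is clear that

$$\mathrm{E}(a_t e_t \mid G_t) = e_t\,\mathrm{E}(a_t \mid G_t) = e_t\left(m\alpha + \frac{2(1-\alpha)e_t}{t}\right),$$

so

$$\mathrm{E}(a_t e_t) = m\alpha\,\mathrm{E}(e_t) + \frac{2(1-\alpha)}{t}\mathrm{E}(e_t^2). \tag{2.17}$$

Using (2.9) again, we have

$$\mathrm{E}(a_t)\mathrm{E}(e_t) = m\alpha\,\mathrm{E}(e_t) + \frac{2(1-\alpha)}{t}\big(\mathrm{E}(e_t)\big)^2. \tag{2.18}$$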

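Substituting (2.16), (2.17) and (2.18) into (2.13), we get

$$\begin{aligned}
\mathrm{Var}(e_{t+1}) &= \left(1 + \frac{4(1-\alpha)}{t}\right)\mathrm{Var}(e_t) + O\big(t^{2m(1-\alpha)} (\log t)^9\big) \\
&= \left(1 + \frac{4(1-\alpha)}{t}\right)\mathrm{Var}(e_t) + O\big(t^{2m(1-\alpha)+\varepsilon_0}\big),
\end{aligned} \tag{2.19}$$

where $\varepsilon_0 > 0$ is given in the statement of the Lemma. Since $G_2$ is fixed, $\mathrm{Var}(e_2) = 0$, and the recurrence (2.19) can be solved directly to get

$$\mathrm{Var}(e_t) = O\left(\sum_{s=2}^{t-1} s^{2m(1-\alpha)+\varepsilon_0} \prod_{j=s+1}^{t-1}\left(1 + \frac{4(1-\alpha)}{j}\right)\right) = O\left(\sum_{s=2}^{t-1} s^{2m(1-\alpha)+\varepsilon_0} \left(\frac{t}{s}\right)^{4(1-\alpha)}\right).$$

For large $t$, since $1 + 2m(1-\alpha) + \varepsilon_0 > 4(1-\alpha)$, this implies that

$$\mathrm{Var}(e_t) = O\big(t^{1+2m(1-\alpha)+\varepsilon_0}\big). \tag{2.20}$$

The Lemma follows immediately from (2.11), (2.20) and Chebyshev's inequality. □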
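The growth rate in (2.20) can be checked on the model recurrence $v_{t+1} = (1 + c/t)v_t + t^{\beta}$ with $v_2 = 0$, where $c = 4(1-\alpha)$ and $\beta = 2m(1-\alpha) + \varepsilon_0$, and where the implicit constant in (2.19) is set to 1 (an assumption for illustration). Heuristically $v_t \approx t^{1+\beta}/(1+\beta-c)$, which is finite and positive because $1 + \beta > c$; the sketch below (hypothetical names) compares the two. The parameters $m = 2$, $\alpha = 0.9$, $\varepsilon_0 = 0.05$ satisfy the hypotheses of Lemma 2.3.

```python
def iterate_variance_model(T, c, beta):
    """Iterate v_{t+1} = (1 + c/t) v_t + t**beta from v_2 = 0; return v_T."""
    v = 0.0
    for t in range(2, T):
        v = (1 + c / t) * v + t ** beta
    return v

m, alpha, eps0, T = 2, 0.9, 0.05, 100_000
c = 4 * (1 - alpha)
beta = 2 * m * (1 - alpha) + eps0
scaled = iterate_variance_model(T, c, beta) / T ** (1 + beta)
print(scaled, 1 / (1 + beta - c))   # both close to 1/1.05
```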



3 Establishing the Recurrence for $\bar D_k(t)$

Before we establish the recurrence for $\bar D_k(t)$, we first have to bound the multi-edges. For $t \ge 2$, let

$$Z_t = \{v \in V_t : \exists\, u \in V_t \text{ such that there are multi-edges between } u \text{ and } v\}$$

and let $X_t = |Z_t|$ be the cardinality of the random set $Z_t$. Clearly, the number of multi-edges in $G_t$ is less than $2mX_t$.

Lemma 3.1 For any $\varepsilon > 0$, we have

$$\mathrm{E}(X_t) = O\big(t^{\alpha/2 + m(1-\alpha) + \varepsilon}\big). \tag{3.1}$$

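Proof. By the definition of the model, we have

$$\mathrm{E}(X_{t+1} \mid G_t) \le X_t + (1-\alpha)\frac{X_t}{t} + \alpha\binom{m}{2}\frac{\Delta_t}{e_t}.$$

Taking expectations and then using (2.4) and the fact that $e_t \ge mt$, we have

$$\begin{aligned}
\mathrm{E}(X_{t+1}) &\le \left(1 + \frac{1-\alpha}{t}\right)\mathrm{E}(X_t) + O\big(t^{\alpha/2 + m(1-\alpha) - 1} (\log t)^3\big) \\
&= \left(1 + \frac{1-\alpha}{t}\right)\mathrm{E}(X_t) + O\big(t^{\alpha/2 + m(1-\alpha) - 1 + \varepsilon}\big).
\end{aligned} \tag{3.2}$$

Using the argument between (2.19) and (2.20), the Lemma follows immediately from (3.2). □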

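Now we establish the recurrence for $\bar D_k(t)$. Note that $\bar D_k(t) = 0$ for $0 \le k < m$ and all $t \ge 2$. For $k \ge m$, we have

$$\begin{aligned}
\bar D_k(t+1) = \bar D_k(t) &+ \alpha m\,\mathrm{E}\left[-\frac{kD_k(t)}{2e_t} + \frac{(k-1)D_{k-1}(t)}{2e_t}\right] + O\left(\mathrm{E}\,\frac{\Delta_t}{e_t}\right) \\
&+ (1-\alpha)(k-1)\,\mathrm{E}\left(-\frac{D_k(t)}{t} + \frac{D_{k-1}(t)}{t}\right) - O\left(\frac{\mathrm{E}(X_t)}{t}\right) + \alpha 1_{k=m}.
\end{aligned} \tag{3.3}$$

In the copying term, the new vertex has degree $k$ with probability $D_k(t)/t$, while a vertex of degree $k-1$ (resp. $k$) gains one edge with probability about $(k-1)/t$ (resp. $k/t$); together these give $(k-1)(D_{k-1}(t) - D_k(t))/t$. The terms $O(\mathrm{E}(\Delta_t/e_t))$ and $O(\mathrm{E}(X_t)/t)$ account for the probabilities that a single step changes the degree of some vertex by more than one, due to new vertex addition and to copying a vertex from $Z_t$, respectively.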



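By Lemma 2.3, the term $\mathrm{E}\left(\frac{kD_k(t)}{2e_t}\right)$ can be expressed as

$$\begin{aligned}
\mathrm{E}\left(\frac{kD_k(t)}{2e_t}\right) &= \mathrm{E}\left(\frac{kD_k(t)}{2e_t} \,\middle|\, |e_t - \mu t| \le t^{1/2+\varepsilon_0+m(1-\alpha)}\right)\mathrm{P}\left(|e_t - \mu t| \le t^{1/2+\varepsilon_0+m(1-\alpha)}\right) \\
&\quad + \mathrm{E}\left(\frac{kD_k(t)}{2e_t} \,\middle|\, |e_t - \mu t| > t^{1/2+\varepsilon_0+m(1-\alpha)}\right)\mathrm{P}\left(|e_t - \mu t| > t^{1/2+\varepsilon_0+m(1-\alpha)}\right) \\
&= \frac{\mathrm{E}\left(kD_k(t) \,\middle|\, |e_t - \mu t| \le t^{1/2+\varepsilon_0+m(1-\alpha)}\right)\mathrm{P}\left(|e_t - \mu t| \le t^{1/2+\varepsilon_0+m(1-\alpha)}\right)}{2\mu t} \\
&\quad \times \left(1 + O\big(t^{-1/2+\varepsilon_0+m(1-\alpha)}\big)\right) + O(t^{-\varepsilon_0}),
\end{aligned} \tag{3.4}$$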

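where we used the fact that $kD_k(t) \le 2e_t$ to handle the second term. In addition, writing $\mathrm{E}(X; A) := \mathrm{E}(X 1_A)$, we have

$$\mathrm{E}\left(kD_k(t) \,\middle|\, |e_t - \mu t| \le t^{1/2+\varepsilon_0+m(1-\alpha)}\right)\mathrm{P}\left(|e_t - \mu t| \le t^{1/2+\varepsilon_0+m(1-\alpha)}\right) = k\bar D_k(t) - \mathrm{E}\left(kD_k(t);\ |e_t - \mu t| > t^{1/2+\varepsilon_0+m(1-\alpha)}\right) \tag{3.5}$$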



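and

$$\begin{aligned}
\mathrm{E}\big(kD_k(t);\ |e_t - \mu t| > t^{1/2+\varepsilon_0+m(1-\alpha)}\big) &= \mathrm{E}\big(kD_k(t);\ |e_t - \mu t| > t^{1/2+\varepsilon_0+m(1-\alpha)},\ e_t \le O(t(\log t)^6)\big) \\
&\quad + \mathrm{E}\big(kD_k(t);\ |e_t - \mu t| > t^{1/2+\varepsilon_0+m(1-\alpha)},\ e_t > O(t(\log t)^6)\big) \\
&\le O\big(t(\log t)^6\big)\,\mathrm{P}\big(|e_t - \mu t| > t^{1/2+\varepsilon_0+m(1-\alpha)}\big) + O(t^2)\,\mathrm{P}\big(e_t > O(t(\log t)^6)\big) \\
&\le O\big(t^{1-\varepsilon_0}(\log t)^6\big) + O(t^{-1}) = O\big(t^{1-\varepsilon_0}(\log t)^6\big).
\end{aligned} \tag{3.6}$$

Note that to get (3.6), we used the fact that $kD_k(t) \le 2e_t$ and the bounds on $e_t$ given in (2.7) and (2.8).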

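Thus, combining (3.4), (3.5) and (3.6),

$$\begin{aligned}
\mathrm{E}\left(\frac{kD_k(t)}{2e_t}\right) &\le \frac{k\bar D_k(t)}{2\mu t}\left(1 + O\big(t^{-1/2+\varepsilon_0+m(1-\alpha)}\big)\right) + O\big(t^{-\varepsilon_0}(\log t)^6\big) \\
&\le \frac{k\bar D_k(t)}{2\mu t} + \frac{\mathrm{E}(2e_t)}{2\mu t}\,O\big(t^{-1/2+\varepsilon_0+m(1-\alpha)}\big) + O\big(t^{-\varepsilon_0}(\log t)^6\big),
\end{aligned}$$

and the matching lower bound follows from (3.4)–(3.6) in the same way. Using (2.11), we have, for $k \ge m$,

$$\mathrm{E}\left(\frac{kD_k(t)}{2e_t}\right) = \frac{k\bar D_k(t)}{2\mu t} + O\big(t^{-1/2+\varepsilon_0+m(1-\alpha)}\big) + O\big(t^{-\varepsilon_0}(\log t)^6\big). \tag{3.7}$$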

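On the other hand, by inequality (2.4) and Lemma 3.1, for any fixed $\varepsilon \in (0, 1 - \alpha/2 - m(1-\alpha))$, we have

$$\mathrm{E}\left(\frac{\Delta_t}{e_t}\right) + \frac{\mathrm{E}(X_t)}{t} = O\big(t^{-1+\alpha/2+m(1-\alpha)+\varepsilon}\big). \tag{3.8}$$

Let

$$\varepsilon_1 = \frac12 \min\left\{\varepsilon_0,\ 1 - \frac{\alpha}{2} - m(1-\alpha),\ \frac12 - \varepsilon_0 - m(1-\alpha)\right\}. \tag{3.9}$$

Now, substituting (3.7) and (3.8) into (3.3) and using $\alpha m/(2\mu) = (2\alpha-1)/2$, we get the recurrence for $\bar D_k(t)$:

$$\bar D_k(t+1) = \bar D_k(t)\left(1 - \frac{(k+2\alpha)/2 - 1}{t}\right) + \frac{k-1}{2t}\,\bar D_{k-1}(t) + O(t^{-\varepsilon_1}) + \alpha 1_{k=m}, \quad \forall\, k \ge m. \tag{3.10}$$

Note that the hidden constant, denoted by $L$, in the term $O(t^{-\varepsilon_1})$ is independent of $k$.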
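The passage from (3.3) to (3.10) rests on a small piece of algebra: with $\mu = \alpha m/(2\alpha - 1)$ one has $\alpha m/(2\mu) = (2\alpha-1)/2$, and the preferential-attachment and copying contributions then combine into the coefficients $-((k+2\alpha)/2 - 1)$ for $\bar D_k(t)/t$ and $(k-1)/2$ for $\bar D_{k-1}(t)/t$. The exact-arithmetic check below (function name ours, error terms ignored) verifies this identity for a range of rational $\alpha \in (1/2, 1)$, $m$ and $k$.

```python
from fractions import Fraction

def recurrence_coefficients(k, m, alpha):
    """Coefficients of D_k(t)/t and D_{k-1}(t)/t after inserting (3.7) into (3.3),
    ignoring error terms; alpha must exceed 1/2 so that mu is defined."""
    mu = alpha * m / (2 * alpha - 1)
    pa = alpha * m / (2 * mu)                       # preferential-attachment weight
    coef_k = -(pa * k + (1 - alpha) * (k - 1))      # loss terms
    coef_k_minus_1 = (pa + (1 - alpha)) * (k - 1)   # gain terms
    return coef_k, coef_k_minus_1

for m in range(1, 5):
    for j in range(11, 20):
        alpha = Fraction(j, 20)
        for k in range(m, m + 10):
            ck, ck1 = recurrence_coefficients(k, m, alpha)
            assert ck == -((k + 2 * alpha) / 2 - 1)
            assert ck1 == Fraction(k - 1, 2)
```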



4 Solving (3.10) and the Proof of Theorem 1.1



In recurrence (3.10), if we heuristically put $d_k = \bar D_k(t)/t$ and assume it is a constant, we get

$$\frac{k+2\alpha}{2}\, d_k = \frac{k-1}{2}\, d_{k-1} + O(t^{-\varepsilon_1}) + \alpha 1_{k=m}.$$

This leads to the consideration of the following recurrence in $k$:

$$\frac{k+2\alpha}{2}\, d_k = \frac{k-1}{2}\, d_{k-1} + \alpha 1_{k=m}, \quad k \ge m; \qquad d_k = 0, \quad 0 \le k < m. \tag{4.1}$$

The following Lemma shows that (4.1) is a good approximation to (3.10).
Lemma 4.1 Suppose that $\{d_k : k \ge 0\}$ is the solution of (4.1). Then there exists a constant $M > 0$ such that

$$|\bar D_k(t) - t d_k| \le M t^{1-\varepsilon_1} \tag{4.2}$$

for all $t \ge 1$ and $k \ge 0$, where $\varepsilon_1$ is given in (3.9).

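Proof. The recurrence can be solved directly: $d_k = 0$, $0 \le k < m$; $d_m = \frac{2\alpha}{m+2\alpha}$; and

$$d_k = d_m \prod_{i=m+1}^{k} \frac{i-1}{i+2\alpha} = d_m \prod_{i=m+1}^{k}\left(1 - \frac{1+2\alpha}{i+2\alpha}\right), \quad k > m. \tag{4.3}$$

Obviously, $d_k$ decays as $k^{-(1+2\alpha)}$; consequently, for some constant $C$,

$$d_k \le C/k \quad \text{for all } k \ge 1. \tag{4.4}$$

Using (4.4) and the degree estimate given in Lemma 2.1, the Lemma follows from a standard argument which can be found in [?] (see Lemma 5.1) and [20] (see Lemma 3.1). □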
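Lemma 4.1 can be illustrated numerically in the idealized case where the $O(t^{-\varepsilon_1})$ term in (3.10) is dropped. The sketch below (hypothetical names; not part of the proof) iterates the resulting deterministic recurrence from the initial graph $G_2$, in which both vertices have degree $2m$, and compares $\bar D_k(t)/t$ with $d_k$ from (4.3). Because the equation for $k$ involves only the indices $k$ and $k-1$, truncating at `kmax` does not affect the values for $k \le$ `kmax`.

```python
def limit_proportions(m, alpha, kmax):
    """d_k from (4.3), for 0 <= k <= kmax."""
    d = [0.0] * (kmax + 1)
    d[m] = 2 * alpha / (m + 2 * alpha)
    for k in range(m + 1, kmax + 1):
        d[k] = d[k - 1] * (k - 1) / (k + 2 * alpha)
    return d

def iterate_mean_counts(T, m, alpha, kmax):
    """Iterate (3.10) without its error term from G_2; return [D_0(T), ..., D_kmax(T)]."""
    assert kmax >= 2 * m
    D = [0.0] * (kmax + 1)
    D[2 * m] = 2.0                                  # G_2: two vertices of degree 2m
    for t in range(2, T):
        new = D[:]
        for k in range(m, kmax + 1):
            new[k] = (D[k] * (1 - ((k + 2 * alpha) / 2 - 1) / t)
                      + (k - 1) / (2 * t) * D[k - 1]
                      + (alpha if k == m else 0.0))
        D = new
    return D

m, alpha, T, kmax = 2, 0.9, 20_000, 10
D = iterate_mean_counts(T, m, alpha, kmax)
d = limit_proportions(m, alpha, kmax)
print(max(abs(D[k] / T - d[k]) for k in range(kmax + 1)))
```

Since $t d_k$ solves the error-free recurrence exactly, the printed gap only reflects the decaying influence of the initial condition.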

Proof of Theorem 1.1. Theorem 1.1 follows immediately from (4.2) and (4.3). □